AITopics | stochastic gradient hamiltonian monte carlo

Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Neural Information Processing Systems | Nov-20-2025, 22:07:20 GMT

Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm.

deep gaussian process, inference, stochastic gradient hamiltonian monte carlo, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.65)
Information Technology > Artificial Intelligence > Machine Learning (0.45)

Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles

Millard, Andrew, Zhao, Zheng, Murphy, Joshua, Maskell, Simon

arXiv.org Machine Learning | May-20-2025

Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) proposals into SMC, enabling efficient mini-batch based sampling. Our resulting SMCSGHMC algorithm outperforms standard stochastic gradient descent (SGD) and deep ensembles across image classification, out-of-distribution (OOD) detection, and transfer learning tasks. We further show that SMCSGHMC mitigates overfitting and improves calibration, providing a flexible, scalable pathway for converting pretrained neural networks into well-calibrated Bayesian models.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2505.11671

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Sweden > Östergötland County > Linköping (0.04)

Genre: Research Report (1.00)

Reviews: Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Neural Information Processing Systems | Oct-7-2024, 09:07:30 GMT

Update after rebuttal: I think the rebuttal is fair. It is very reassuring that pseudocode will be provided to the readers. I therefore keep my decision unchanged. Original review: In the paper "Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo" the author(s) consider the problem of inference for deep Gaussian processes (DGPs). Given the large number of layers and the width of each layer, direct inference is computationally infeasible, which has motivated numerous variational inference methods to approximate the posterior distribution, for example doubly stochastic variational inference (DSVI) of [Salimbeni and Deisenroth, 2017]. The authors argue that these unimodal approximations are typically poor given the multimodal and non-Gaussian nature of the posterior.

deep gaussian process, inference, stochastic gradient hamiltonian monte carlo, (8 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.37)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Bayesian Optimization with Robust Bayesian Neural Networks

Neural Information Processing Systems | Mar-12-2024, 15:58:45 GMT

Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach - using Gaussian process models - does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation.

hyperparameter, neural network, optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Havasi, Marton, Hernández-Lobato, José Miguel, Murillo-Fuentes, Juan José

Neural Information Processing Systems | Feb-14-2020, 19:55:53 GMT

Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples.

deep gaussian process, inference, stochastic gradient hamiltonian monte carlo, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimization

Akyildiz, Ömer Deniz, Sabanis, Sotirios

arXiv.org Machine Learning | Feb-13-2020

This problem arises in many cases in machine learning, most notably in large-scale (mini-batch) Bayesian inference (Welling and Teh, 2011; Ahn et al., 2012) and nonconvex stochastic optimization (Raginsky et al., 2017). In the Bayesian inference setting, one is interested in sampling from a posterior probability measure where U corresponds to the sum of the log-likelihood and the log-prior. In nonconvex optimization, U(·) is the nonconvex cost function to be minimized. For large values of β, a sample from the target measure (1) is an approximate minimizer of the potential U (Raginsky et al., 2017). Consequently, nonasymptotic error bounds for the schemes designed to sample from (1) can be used to obtain guarantees for Bayesian inference or nonconvex optimization. Sampling from a measure of the form (1) is also central in statistical physics (Binder et al., 1993), most notably in molecular dynamics (Haile, 1992).
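The claim that a sample drawn at large β is an approximate minimizer of U can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the excerpt does not show measure (1), so the code assumes the usual Gibbs form, with density proportional to exp(-βU(x)). The tilted double-well `potential`, the grid, and the `radius` parameter are hypothetical choices and do not come from the paper.

```python
import numpy as np

def potential(x):
    # Hypothetical nonconvex potential: a tilted double well whose
    # global minimum lies near x = -1 and local minimum near x = +1.
    return (x**2 - 1.0)**2 + 0.3 * x

def mass_near_minimizer(beta, radius=0.2):
    # Assumed target: density proportional to exp(-beta * U(x)),
    # discretized on a uniform grid over [-3, 3].
    x = np.linspace(-3.0, 3.0, 6001)
    u = potential(x)
    w = np.exp(-beta * (u - u.min()))  # shift by the minimum for numerical stability
    p = w / w.sum()                    # normalized probabilities on the grid
    x_star = x[np.argmin(u)]           # global minimizer on the grid
    return p[np.abs(x - x_star) <= radius].sum()

# Probability mass within `radius` of the global minimizer for growing beta.
for beta in (1.0, 10.0, 100.0):
    print(beta, mass_near_minimizer(beta))
```

As β grows, the printed mass approaches 1, which is the sense in which sampling guarantees can translate into optimization guarantees.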

assumption 2, chau and rasonyi, theorem 2, (8 more...)

arXiv.org Machine Learning

2002.05465

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.74)

Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Learning in the Big Data Regime

Chau, Huy N., Rasonyi, Miklos

arXiv.org Machine Learning | Mar-25-2019

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is a momentum version of stochastic gradient descent with properly injected Gaussian noise to find a global minimum. In this paper, non-asymptotic convergence analysis of SGHMC is given in the context of non-convex optimization, where subsampling techniques are used over an i.i.d. dataset for gradient updates. Our results complement those of [RRT17] and improve on those of [GGZ18].
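The description of SGHMC as momentum SGD with injected Gaussian noise can be made concrete with a short sketch. This is a minimal illustration, not the scheme analyzed in the paper: it assumes the commonly used discretization with a friction term, injected noise of variance 2 * friction * step_size, and no correction for minibatch gradient noise. The function names, the toy Gaussian-mean problem, and all parameter values are hypothetical. It samples from a density proportional to exp(-U) and omits the temperature scaling used for optimization.

```python
import numpy as np

def sghmc(stoch_grad, theta0, n_steps, step_size, friction, rng):
    """Sketch of an SGHMC sampler (assumed form, no gradient-noise correction)."""
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)                      # momentum variable
    noise_std = np.sqrt(2.0 * friction * step_size)
    samples = np.empty((n_steps,) + theta.shape)
    for t in range(n_steps):
        g = stoch_grad(theta, rng)                # minibatch estimate of grad U
        # Momentum update: friction decay, gradient step, injected Gaussian noise.
        v = (1.0 - friction) * v - step_size * g \
            + noise_std * rng.standard_normal(theta.shape)
        theta = theta + v
        samples[t] = theta
    return samples

# Hypothetical toy problem: posterior over the mean of unit-variance Gaussian
# data with a flat prior, so U(theta) = sum_i (theta - x_i)^2 / 2.
rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=1000)

def stoch_grad(theta, rng, batch_size=100):
    idx = rng.integers(0, data.size, size=batch_size)  # sample with replacement
    return (data.size / batch_size) * np.sum(theta - data[idx])

samples = sghmc(stoch_grad, theta0=0.0, n_steps=20000,
                step_size=1e-5, friction=0.1, rng=rng)
print(samples[5000:].mean(), data.mean())
```

After discarding the first 5000 draws as burn-in, the sample mean should sit close to the data mean. Because the minibatch noise is not corrected for, the sampled spread is somewhat wider than the exact posterior.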

artificial intelligence, assumption 2, machine learning, (14 more...)

arXiv.org Machine Learning

1903.10328

Country: Europe (0.46)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Parallel-tempered Stochastic Gradient Hamiltonian Monte Carlo for Approximate Multimodal Posterior Sampling

Luo, Rui, Zhang, Qiang, Liu, Yuanyuan

arXiv.org Machine Learning | Dec-7-2018

We propose a new sampler that integrates the protocol of parallel tempering with the Nosé-Hoover (NH) dynamics. The proposed method can efficiently draw representative samples from complex posterior distributions with multiple isolated modes in the presence of noise arising from stochastic gradient. It potentially facilitates deep Bayesian learning on large datasets where complex multimodal posteriors and mini-batch gradient are encountered.

artificial intelligence, machine learning, replica, (11 more...)

arXiv.org Machine Learning

1812.01181

Country: Oceania > Australia (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.63)

Predictive Uncertainty in Large Scale Classification using Dropout - Stochastic Gradient Hamiltonian Monte Carlo

Vergara, Diego, Hernández, Sergio, Valdenegro, Matías, Jorquera, Felipe

arXiv.org Machine Learning | May-12-2018

Predictive uncertainty is crucial for many computer vision tasks, from image classification to autonomous driving systems. Hamiltonian Monte Carlo (HMC) is an inference method for sampling complex posterior distributions. On the other hand, Dropout regularization has been proposed as an approximate model averaging technique that tends to improve generalization in large scale models such as deep neural networks. Although HMC provides convergence guarantees for most standard Bayesian models, it does not handle discrete parameters arising from Dropout regularization. In this paper, we present a robust methodology for predictive uncertainty in large scale classification problems, based on Dropout and Stochastic Gradient Hamiltonian Monte Carlo. Even though Dropout induces a non-smooth energy function with no such convergence guarantees, the resulting discretization of the Hamiltonian proves empirically successful. The proposed method makes it possible to estimate predictive accuracy effectively and to provide better generalization for difficult test examples.

artificial intelligence, d-sghmc, machine learning, (13 more...)

arXiv.org Machine Learning

1805.04756

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)
